Smart Paper Notes

Spine-like Joint Link Mechanism to Design Wearable Assistive Devices with Comfort and Support

Jungyeong Kim, Jungsan Cho, Jinhyeon Kim, Jin Tak Kim, Sangchul Han, Sangshin Park, Han Ul Yoon

Category: Robotics

2021-11-27

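When developing wearable assistive devices, comfort and support are the two main issues to consider. In conventional design approaches, the degrees of freedom of the wearer's joint movement tend to be oversimplified. As a result, the wearer's motion becomes restrained, and bone or ligament injuries may occur in an unexpected fall. To mitigate these issues, this letter proposes a novel joint link mechanism inspired by the structure and function of the human spine. The key feature of the proposed spine-like joint link mechanism is that hemispherical blocks are concatenated via flexible synthetic fiber lines so that their concatenation stiffness can be adjusted according to the tensile force. This feature has great potential for designing wearable assistive devices that can support elderly people's sit-to-stand action or augment spinal motion by regulating the concatenation stiffness. In addition, the concatenated hemispherical blocks enable the wearer to move his or her joint with full degrees of freedom, which in turn increases the wearer's mobility and prevents joint misalignment. Experimental results with a testbed and a pilot wearer confirmed that the spine-like joint link mechanism can serve as a key component in designing wearable assistive devices for better mobility and safety.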

translated by Google Translate

A Generalized Framework for Critical Heat Flux Detection Using Unsupervised Image-to-Image Translation

Firas Al-Hindawi, Tejaswi Soorib, Han Hu, Md Siddiquee, Hyunsoo Yoon, Teresa Wu, Ying Sun

Category: Computer Vision

2022-12-18

This work proposes a framework that generalizes Critical Heat Flux (CHF) detection classification models using an Unsupervised Image-to-Image (UI2I) translation model. The framework enables a typical classification model that was trained and tested on boiling images from domain A to predict boiling images from domain B, which the classification model has never seen. This is done by using the UI2I model to transform the domain B images so that they look like the domain A images the classification model is familiar with. Although a CNN was used as the classification model and Fixed-Point GAN (FP-GAN) as the UI2I model, the framework is model agnostic: it can generalize any type of image classification model, which makes it applicable to a variety of similar applications and not limited to the boiling crisis detection problem. It also means that the more UI2I models advance, the better the framework performs.
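
The pipeline described in this abstract (translate a domain B image into domain A, then apply the unchanged domain A classifier) can be sketched as a simple function composition. This is a minimal sketch, not the authors' code: `make_cross_domain_predictor`, `toy_translator`, and `toy_classifier` are hypothetical names, and the toy models (a brightness shift and a mean-intensity threshold) are stand-ins for the FP-GAN and CNN mentioned in the abstract.

```python
from typing import Callable, List

Image = List[List[float]]  # toy grayscale image, pixel values in [0, 1]


def make_cross_domain_predictor(
    translate_b_to_a: Callable[[Image], Image],
    classify_a: Callable[[Image], str],
) -> Callable[[Image], str]:
    """Wrap a domain-A classifier so it can label domain-B images."""

    def predict(image_b: Image) -> str:
        # Step 1: make the domain-B image look like domain A.
        # Step 2: run the unchanged domain-A classifier on the result.
        return classify_a(translate_b_to_a(image_b))

    return predict


# Toy stand-ins (assumptions for illustration, not the paper's models).
def toy_classifier(image: Image) -> str:
    pixels = [p for row in image for p in row]
    return "CHF" if sum(pixels) / len(pixels) > 0.5 else "pre-CHF"


def toy_translator(image: Image) -> Image:
    # Pretend domain B is uniformly darker than domain A by 0.3.
    return [[min(1.0, p + 0.3) for p in row] for row in image]


predict_b = make_cross_domain_predictor(toy_translator, toy_classifier)
dark_chf_image = [[0.4, 0.4], [0.4, 0.4]]  # a domain-B image
```

Because the wrapper only needs two callables, either component can be swapped without touching the other, which is the sense in which the framework is model agnostic.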

Efficient and Accurate Quantized Image Super-Resolution on Mobile NPUs, Mobile AI & AIM 2022 challenge: Report

Andrey Ignatov, Radu Timofte, Maurizio Denna, Abdel Younes, Ganzorig Gankhuyag, Jingang Huh, Myeong Kyun Kim, Kihwan Yoon, Hyeon-Cheol Moon, Seungho Lee

Category: Computer Vision

2022-11-07

Image super-resolution is a common task on mobile and IoT devices, where one often needs to upscale and enhance low-resolution images and video frames. While numerous solutions have been proposed for this problem in the past, they are usually not compatible with low-power mobile NPUs, which have many computational and memory constraints. In this Mobile AI challenge, we address this problem and ask the participants to design an efficient quantized image super-resolution solution that can demonstrate real-time performance on mobile NPUs. The participants were provided with the DIV2K dataset and trained INT8 models to perform high-quality 3X image upscaling. The runtime of all models was evaluated on the Synaptics VS680 Smart Home board with a dedicated edge NPU capable of accelerating quantized neural networks. All proposed solutions are fully compatible with the above NPU, reaching up to 60 FPS when reconstructing Full HD resolution images. A detailed description of all models developed in the challenge is provided in this paper.
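
To make "INT8 models" concrete, the following sketch shows generic affine quantization, which maps floating-point values to 8-bit integers through a scale and a zero point. The abstract does not say which quantization scheme the participants used, so this is an assumption for illustration only; the function names are hypothetical.

```python
from typing import List, Tuple

INT8_MIN, INT8_MAX = -128, 127


def choose_qparams(x_min: float, x_max: float) -> Tuple[float, int]:
    """Pick a scale and zero point so [x_min, x_max] maps onto the INT8 range."""
    # The representable range must contain 0 so that 0.0 quantizes exactly.
    x_min, x_max = min(x_min, 0.0), max(x_max, 0.0)
    scale = (x_max - x_min) / (INT8_MAX - INT8_MIN) or 1.0  # avoid scale == 0
    zero_point = round(INT8_MIN - x_min / scale)
    return scale, zero_point


def quantize(values: List[float], scale: float, zero_point: int) -> List[int]:
    """Round to the nearest integer step and clamp to the INT8 range."""
    return [
        max(INT8_MIN, min(INT8_MAX, round(v / scale) + zero_point))
        for v in values
    ]


def dequantize(q_values: List[int], scale: float, zero_point: int) -> List[float]:
    """Map INT8 codes back to approximate floating-point values."""
    return [(q - zero_point) * scale for q in q_values]


pixels = [0.0, 0.25, 0.5, 1.0]  # toy normalized pixel values
scale, zero_point = choose_qparams(0.0, 1.0)
codes = quantize(pixels, scale, zero_point)
restored = dequantize(codes, scale, zero_point)
```

For values inside the chosen range, the round-trip error is at most half a quantization step (`scale / 2`), which is the accuracy cost traded for integer arithmetic on the NPU.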

Design, Field Evaluation, and Traffic Analysis of a Competitive Autonomous Driving Model in a Congested Environment

Daegyu Lee, Hyunki Seong, Seungil Han, Gyuree Kang, D. Hyunchul Shim, Yoonjin Yoon

Category: Robotics

2022-10-31

Recently, numerous studies have investigated cooperative traffic systems that rely on vehicle-to-everything (V2X) communication. Unfortunately, when multiple autonomous vehicles are deployed while exposed to communication failure, the ideal conditions of different vehicles may conflict, leading to adversarial situations on the roads. In South Korea, virtual and real-world urban autonomous multi-vehicle races were held in March and November of 2021, respectively. During the competition, multiple vehicles were involved simultaneously, which required maneuvers such as overtaking low-speed vehicles, negotiating intersections, and obeying traffic laws. In this study, we introduce a fully autonomous driving software stack for deploying a competitive driving model, which enabled us to win the urban autonomous multi-vehicle races. We evaluate module-based systems such as navigation, perception, and planning in real and virtual environments. Additionally, we analyze traffic after collecting position data from multiple vehicles over communication to gain additional insight into a multi-agent autonomous driving scenario. Finally, we propose a method for analyzing traffic in order to compare the spatial distribution of multiple autonomous vehicles. We study the similarity distribution between each team's driving log data to determine the impact of competitive autonomous driving on the traffic environment.

K-MHaS: A Multi-label Hate Speech Detection Dataset in Korean Online News Comment

Jean Lee, Taejun Lim, Heejun Lee, Bogeun Jo, Yangsok Kim, Heegeun Yoon, Soyeon Caren Han

Category: Natural Language Processing | Artificial Intelligence

2022-08-23

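Online hate speech detection has become important with the growth of digital devices, but resources in languages other than English are extremely limited. We introduce K-MHaS, a new multi-label dataset for hate speech detection that effectively handles Korean language patterns. The dataset consists of 109k utterances from news comments and provides multi-label classification with 1 to 4 labels, handling subjectivity and intersectionality. We evaluate strong baselines on K-MHaS. KR-BERT with a sub-character tokenizer outperforms the others, recognizing decomposed characters in each hate speech class.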

Injecting 3D Perception of Controllable NeRF-GAN into StyleGAN for Editable Portrait Image Synthesis

Jeong-gi Kwak, Yuanming Li, Dongsik Yoon, Donghyeon Kim, David Han, Hanseok Ko

Category: Computer Vision

2022-07-21

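Over the years, 2D GANs have achieved great success in photorealistic portrait generation. However, they lack 3D understanding in the generation process and therefore suffer from multi-view inconsistency. To alleviate this issue, many 3D-aware GANs have been proposed and have shown notable results, but 3D GANs struggle with editing semantic attributes. The controllability and interpretability of 3D GANs have not been much explored. In this work, we propose two solutions to overcome these weaknesses of 2D GANs and 3D-aware GANs. We first introduce a novel 3D-aware GAN, SURF-GAN, which is capable of discovering semantic attributes during training and controlling them in an unsupervised manner. We then inject the prior of SURF-GAN into StyleGAN to obtain a high-fidelity 3D-controllable generator. Unlike existing latent-based methods that allow implicit pose control, the proposed 3D-controllable StyleGAN enables explicit pose control over portrait generation. This distillation allows direct compatibility between 3D control and many StyleGAN-based techniques (e.g., inversion and stylization), and also brings an advantage in terms of computational resources. Our code is available at https://github.com/jgkwak95/surf-gan.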

Generate and Edit Your Own Character in a Canonical View

Jeong-gi Kwak, Yuanming Li, Dongsik Yoon, David Han, Hanseok Ko

Category: Computer Vision

2022-05-06

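Recently, synthesizing personalized characters from a single user-provided portrait has received remarkable attention owing to the rapid popularization of social media and the metaverse. The input image is not always in a frontal view, so acquiring or predicting the canonical view is important for 3D modeling and other applications. Although progress in generative models enables the stylization of portraits, obtaining a stylized image in the canonical view remains a challenging task. There have been several studies on face frontalization, but their performance decreases significantly when the input is not in the real image domain, e.g., a cartoon or a painting. Stylizing after frontalization also leads to degraded output. In this paper, we propose a novel and unified framework that generates stylized portraits in the canonical view. With the proposed latent mapper, we analyze and discover a frontalization mapping in the latent space of StyleGAN, so that stylization and frontalization are performed at once. In addition, our model can be trained with unlabeled 2D image sets, without any 3D supervision. Experimental results demonstrate the effectiveness of our method.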

TinyMIM: An Empirical Study of Distilling MIM Pre-trained Models

Sucheng Ren, Fangyun Wei, Zheng Zhang, Han Hu

Category: Computer Vision

2023-01-03

Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications benefit only marginally, or not at all, from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distilling targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) distilling token relations is more effective than CLS token-based and feature-based distillation; 2) using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student does not match that of the teacher; 3) weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over from-scratch MIM pre-training on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with gains of +4.2%/+2.4%/+1.4%, respectively. Our TinyMIM model of base size achieves 52.2 mIoU in ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models, that is, by exploring better training methods rather than introducing inductive biases into architectures as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
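
As a rough illustration of what "distilling token relations" can mean, the sketch below compares token-to-token similarity maps of a teacher and a student instead of comparing their features directly. The abstract does not define the relation or the loss, so both choices here (softmax of scaled dot products between token features, and a KL divergence) are assumptions, and the function names are hypothetical.

```python
import math
from typing import List

Matrix = List[List[float]]  # one row per token, one column per feature


def token_relations(tokens: Matrix) -> Matrix:
    """Row-wise softmax of scaled dot products between all token pairs."""
    dim = len(tokens[0])
    relations = []
    for a in tokens:
        scores = [sum(x * y for x, y in zip(a, b)) / math.sqrt(dim) for b in tokens]
        peak = max(scores)  # subtract the max for numerical stability
        exps = [math.exp(s - peak) for s in scores]
        total = sum(exps)
        relations.append([e / total for e in exps])
    return relations


def relation_distillation_loss(teacher_tokens: Matrix, student_tokens: Matrix) -> float:
    """Mean KL(teacher || student) over the rows of the two relation maps."""
    t_rel = token_relations(teacher_tokens)
    s_rel = token_relations(student_tokens)
    kl_rows = [
        sum(p * math.log(p / q) for p, q in zip(t_row, s_row) if p > 0)
        for t_row, s_row in zip(t_rel, s_rel)
    ]
    return sum(kl_rows) / len(kl_rows)


# Three tokens; the teacher uses 4 features per token, the student only 2.
teacher = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0], [1.0, 1.0, 0.0, 0.0]]
student = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
loss = relation_distillation_loss(teacher, student)
```

One practical property of this formulation is that the relation maps are N x N for N tokens regardless of feature width, so a narrow student can be compared with a wide teacher without an extra projection layer.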

Reference Twice: A Simple and Unified Baseline for Few-Shot Instance Segmentation

Yue Han, Jiangning Zhang, Zhucun Xue, Chao Xu, Xintian Shen, Yabiao Wang, Chengjie Wang, Yong Liu, Xiangtai Li

Category: Computer Vision

2023-01-03

Few Shot Instance Segmentation (FSIS) requires models to detect and segment novel classes from only a few support examples. In this work, we explore a simple yet unified solution for FSIS as well as its incremental variants, and introduce a new framework named Reference Twice (RefT) to fully explore the relationship between support and query features based on a Transformer-like framework. Our key insights are twofold. First, with the aid of support masks, we can generate dynamic class centers more appropriately to re-weight query features. Second, we find that support object queries have already encoded key factors after base training. In this way, the query features can be enhanced twice, at the feature level and at the instance level. In particular, we first design a mask-based dynamic weighting module to enhance support features and then propose to link object queries for better calibration via cross-attention. After the above steps, performance on the novel classes improves significantly over our strong baseline. Additionally, our new framework can be easily extended to incremental FSIS with minor modification. When benchmarked on the COCO dataset under the FSIS, gFSIS, and iFSIS settings, our method achieves competitive performance compared to existing approaches across different shots, e.g., we boost nAP by a noticeable +8.2/+9.4 over the current state-of-the-art FSIS method for 10/30-shot. We further demonstrate the superiority of our approach on Few Shot Object Detection. Code and model will be available.

Muse: Text-To-Image Generation via Masked Generative Transformers

Huiwen Chang, Han Zhang, Jarred Barber, AJ Maschinot, Jose Lezama, Lu Jiang, Ming-Hsuan Yang, Kevin Murphy, William T. Freeman, Michael Rubinstein

Category: Computer Vision | Artificial Intelligence | Machine Learning

2023-01-02

We present Muse, a text-to-image Transformer model that achieves state-of-the-art image generation performance while being significantly more efficient than diffusion or autoregressive models. Muse is trained on a masked modeling task in discrete token space: given the text embedding extracted from a pre-trained large language model (LLM), Muse is trained to predict randomly masked image tokens. Compared to pixel-space diffusion models, such as Imagen and DALL-E 2, Muse is significantly more efficient due to the use of discrete tokens and requiring fewer sampling iterations; compared to autoregressive models, such as Parti, Muse is more efficient due to the use of parallel decoding. The use of a pre-trained LLM enables fine-grained language understanding, translating to high-fidelity image generation and the understanding of visual concepts such as objects, their spatial relationships, pose, cardinality etc. Our 900M parameter model achieves a new SOTA on CC3M, with an FID score of 6.06. The Muse 3B parameter model achieves an FID of 7.88 on zero-shot COCO evaluation, along with a CLIP score of 0.32. Muse also directly enables a number of image editing applications without the need to fine-tune or invert the model: inpainting, outpainting, and mask-free editing. More results are available at https://muse-model.github.io
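
The efficiency argument above rests on parallel decoding: each model call predicts every masked token at once, so the number of calls can be far smaller than the number of tokens. The sketch below shows one common way to do this, keeping the most confident predictions at each step and leaving the rest masked. The abstract does not describe Muse's actual decoding schedule, so the linear schedule, the confidence rule, and the names `parallel_decode` and `toy_predict` are assumptions for illustration.

```python
import math
from typing import Callable, List, Optional, Tuple

Prediction = Tuple[int, float]  # (token id, confidence)
Tokens = List[Optional[int]]    # None marks a still-masked position


def parallel_decode(
    length: int,
    predict: Callable[[Tokens], List[Prediction]],
    steps: int,
) -> Tokens:
    """Fill `length` masked tokens using at most `steps` calls to `predict`."""
    tokens: Tokens = [None] * length
    for step in range(1, steps + 1):
        masked = [i for i, t in enumerate(tokens) if t is None]
        if not masked:
            break
        preds = predict(tokens)  # one call scores every position in parallel
        # Linear schedule: after step k, ceil(length * k / steps) tokens are fixed.
        target_revealed = math.ceil(length * step / steps)
        n_new = target_revealed - (length - len(masked))
        # Keep only the most confident predictions among masked positions.
        best = sorted(masked, key=lambda i: preds[i][1], reverse=True)[:n_new]
        for i in best:
            tokens[i] = preds[i][0]
    return tokens


calls = []  # records how many times the toy model is invoked


def toy_predict(tokens: Tokens) -> List[Prediction]:
    # Hypothetical stand-in for the Transformer: fixed tokens and confidences.
    calls.append(1)
    return [((i * 3) % 7, 1.0 / (1 + i)) for i in range(len(tokens))]


image_tokens = parallel_decode(length=16, predict=toy_predict, steps=4)
```

Here 16 tokens are produced with 4 model calls, whereas an autoregressive decoder that emits one token per call would need 16.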
